NSF PAR Search | NSF Public Access Repository


Covariance loss, Szemerédi regularity, and differential privacy

https://doi.org/10.1090/proc/17126

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (February 2025, Proceedings of the American Mathematical Society)

We show how randomized rounding based on Grothendieck’s identity can be used to prove a nearly tight bound on the covariance loss, that is, the amount of covariance that is lost by taking conditional expectation. This result yields a new type of weak Szemerédi regularity lemma for positive semidefinite matrices and kernels. Moreover, it can be used to construct differentially private synthetic data.
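As a small illustration of the identity this abstract names, the sketch below numerically checks Grothendieck's identity: for unit vectors u, v and a standard Gaussian vector g, E[sign⟨g,u⟩ · sign⟨g,v⟩] = (2/π) arcsin⟨u,v⟩. This is only a Monte Carlo sanity check of the identity, not the paper's rounding argument; the function name `grothendieck_check`, the choice of two dimensions, and the sample size are assumptions made for this example.

```python
import numpy as np

def grothendieck_check(rho, n_samples=200_000, seed=0):
    """Compare a Monte Carlo estimate of E[sign<g,u> sign<g,v>] with (2/pi) arcsin(rho).

    u and v are unit vectors in R^2 with inner product rho (hypothetical setup).
    """
    rng = np.random.default_rng(seed)
    u = np.array([1.0, 0.0])
    v = np.array([rho, np.sqrt(1.0 - rho**2)])  # unit vector with <u, v> = rho
    g = rng.standard_normal((n_samples, 2))     # rows are independent N(0, I) vectors
    estimate = np.mean(np.sign(g @ u) * np.sign(g @ v))
    exact = (2.0 / np.pi) * np.arcsin(rho)
    return estimate, exact

estimate, exact = grothendieck_check(0.5)
print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"(2/pi) * arcsin(0.5): {exact:.4f}")
```

The two printed values should agree up to sampling error, which shrinks like one over the square root of the sample size.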
Free, publicly-accessible full text available February 1, 2026
Private measures, random walks, and synthetic data

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (April 2024, Probability Theory and Related Fields)

Full Text Available
Private measures, random walks, and synthetic data

https://doi.org/10.1007/s00440-024-01279-z

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (April 2024, Probability Theory and Related Fields)

Abstract: Differential privacy is a mathematical concept that provides an information-theoretic security guarantee. While differential privacy has emerged as a de facto standard for guaranteeing privacy in data sharing, the known mechanisms to achieve it come with some serious limitations. Utility guarantees are usually provided only for a fixed, a priori specified set of queries. Moreover, there are no utility guarantees for more complex, but very common, machine learning tasks such as clustering or classification. In this paper we overcome some of these limitations. Working with metric privacy, a powerful generalization of differential privacy, we develop a polynomial-time algorithm that creates a private measure from a data set. This private measure allows us to efficiently construct private synthetic data that are accurate for a wide range of statistical analysis tools. Moreover, we prove an asymptotically sharp min-max result for private measures and synthetic data in general compact metric spaces, for any fixed privacy budget $\varepsilon$ bounded away from zero. A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables, yet which deviates from the origin logarithmically slowly.
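For readers unfamiliar with the two privacy notions this abstract contrasts, the standard textbook definitions can be written as below. The symbols $\mathcal{M}$ (a randomized mechanism), $S$ (a measurable set of outputs) and $d$ (a metric on inputs) are notation assumed for this note and do not appear in the listing; the paper's own normalization may differ.

```latex
% epsilon-differential privacy: for all data sets x, x' differing in a single record
% and all measurable output sets S,
\mathbb{P}\{\mathcal{M}(x) \in S\} \le e^{\varepsilon}\, \mathbb{P}\{\mathcal{M}(x') \in S\}.

% Metric privacy with respect to a metric d: for all inputs x, x' and all measurable S,
\mathbb{P}\{\mathcal{M}(x) \in S\} \le e^{\varepsilon\, d(x, x')}\, \mathbb{P}\{\mathcal{M}(x') \in S\}.
```

Taking $d$ to be the number of records in which two data sets differ recovers the first definition from the second, which is the sense in which metric privacy generalizes differential privacy.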
Covariance's Loss is Privacy's Gain: Computationally Efficient, Private and Accurate Synthetic Data

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (February 2024, Foundations of Computational Mathematics)

Full Text Available
Privacy of Synthetic Data: A Statistical Framework

https://doi.org/10.1109/TIT.2022.3216793

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (January 2023, IEEE Transactions on Information Theory)

Full Text Available
Matrix concentration inequalities and free probability

https://doi.org/10.1007/s00222-023-01204-6

Bandeira, Afonso S.; Boedihardjo, March T.; van Handel, Ramon (January 2023, Inventiones mathematicae)

Full Text Available
Privacy of synthetic data: A statistical framework

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (October 2022, IEEE Transactions on Information Theory)

Full Text Available
Private Sampling: A Noiseless Approach for Generating Differentially Private Synthetic Data

https://doi.org/10.1137/21M1449944

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (September 2022, SIAM Journal on Mathematics of Data Science)

Full Text Available
Private sampling: a noiseless approach for generating differentially private synthetic data

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (April 2022, SIAM Journal on Mathematics of Data Science)

Full Text Available
Covariance’s Loss is Privacy’s Gain: Computationally Efficient, Private and Accurate Synthetic Data

https://doi.org/10.1007/s10208-022-09591-7

Boedihardjo, March; Strohmer, Thomas; Vershynin, Roman (January 2022, Foundations of Computational Mathematics)

Abstract: The protection of private information is of vital importance in data-driven research, business and government. The conflict between privacy and utility has triggered intensive research in the computer science and statistics communities, who have developed a variety of methods for privacy-preserving data release. Among the main concepts that have emerged are anonymity and differential privacy. Today, another solution is gaining traction: synthetic data. However, the road to privacy is paved with NP-hard problems. In this paper, we focus on the NP-hard challenge to develop a synthetic data generation method that is computationally efficient, comes with provable privacy guarantees and rigorously quantifies data utility. We solve a relaxed version of this problem by studying a fundamental, but at first glance completely unrelated, problem in probability concerning the concept of covariance loss. Namely, we find a nearly optimal and constructive answer to the question of how much information is lost when we take conditional expectation. Surprisingly, this excursion into theoretical probability produces mathematical techniques that allow us to derive constructive, approximately optimal solutions to difficult applied problems concerning microaggregation, privacy and synthetic data.
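To make the notion of covariance loss concrete, the following minimal sketch computes it for an empirical distribution. It assumes the conditional expectation is taken with respect to a partition of the data points into cells, so that E[X | partition] replaces each point by the mean of its cell (as in microaggregation). The function name `covariance_loss` and the two-cell partition are hypothetical choices for this example, not the paper's construction.

```python
import numpy as np

def covariance_loss(X, labels):
    """Return E[X X^T] - E[Y Y^T] for the empirical distribution of the rows of X,
    where Y replaces each row by the mean of its cell, i.e. Y = E[X | partition]."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Y = np.empty_like(X)
    for cell in np.unique(labels):
        mask = labels == cell
        Y[mask] = X[mask].mean(axis=0)   # conditional expectation on this cell
    n = X.shape[0]
    # X and Y have the same mean, so this also equals Cov(X) - Cov(Y).
    return X.T @ X / n - Y.T @ Y / n

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
labels = (X[:, 0] > 0).astype(int)       # hypothetical partition by sign of coordinate 0
loss = covariance_loss(X, labels)
print("Frobenius norm of the loss:", np.linalg.norm(loss, "fro"))
print("Smallest eigenvalue:", np.linalg.eigvalsh(loss).min())
```

By the law of total covariance the loss matrix is positive semidefinite. It vanishes when every point is its own cell and equals the full covariance when there is a single cell, so finer partitions lose less covariance.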
Full Text Available
